Discovering User Attribute Stylistic Differences via Paraphrasing
Abstract
User attribute prediction from social media text has proven successful and useful for downstream tasks. In previous studies, differences in how users with different traits use language have been limited primarily to the presence or absence of words that indicate topical preferences. In this study, we aim to find linguistic style distinctions across three user attributes: gender, age and occupational class. By combining paraphrases with a simple yet effective method, we capture a wide set of stylistic differences that are free from topic bias. We demonstrate their predictive power in user profiling, their agreement with human perception and psycholinguistic hypotheses, and their potential use in generating natural language tailored to specific user traits.
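The abstract does not spell out the method itself. The following is a hedged illustration of why paraphrases can separate style from topic. It compares how often two user groups choose one phrasing over a paraphrase of it. Because both phrasings carry the same meaning, a difference in preference points to style rather than topic. The function names, the smoothed log-odds score, the `alpha` parameter and the toy counts are all hypothetical choices made for this sketch. They are not the authors' method or data.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input format: phrase -> number of occurrences in one user group's posts.
Counts = Dict[str, int]


def paraphrase_log_odds(
    pair: Tuple[str, str],
    group_a: Counts,
    group_b: Counts,
    alpha: float = 0.5,
) -> float:
    """Log-odds ratio of choosing pair[0] over pair[1] in group_a versus group_b.

    Positive values mean group_a favours pair[0] (relative to its paraphrase)
    more strongly than group_b does. `alpha` is add-alpha smoothing so that
    unseen phrases never produce log(0) or a division by zero.
    """
    x, y = pair
    odds_a = (group_a.get(x, 0) + alpha) / (group_a.get(y, 0) + alpha)
    odds_b = (group_b.get(x, 0) + alpha) / (group_b.get(y, 0) + alpha)
    return math.log(odds_a) - math.log(odds_b)


def rank_stylistic_pairs(
    pairs: List[Tuple[str, str]],
    group_a: Counts,
    group_b: Counts,
    alpha: float = 0.5,
) -> List[Tuple[Tuple[str, str], float]]:
    """Score every paraphrase pair and sort by absolute log-odds, largest first."""
    scored = [(p, paraphrase_log_odds(p, group_a, group_b, alpha)) for p in pairs]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Made-up toy counts for illustration only (not data from the paper).
pairs = [("gonna", "going to"), ("however", "but")]
younger = {"gonna": 40, "going to": 10, "however": 5, "but": 95}
older = {"gonna": 5, "going to": 45, "however": 10, "but": 90}

ranked = rank_stylistic_pairs(pairs, younger, older)
for pair, score in ranked:
    print(pair, round(score, 3))
```

With these toy counts, ("gonna", "going to") ranks first with a positive score, which means the first group prefers "gonna". ("however", "but") gets a smaller, negative score. Scoring within a paraphrase pair, and not on raw word frequency, is the design choice that keeps topic roughly constant across the comparison.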
Similar Resources
Paraphrasing for Style
We present an initial investigation into the task of paraphrasing language while targeting a particular writing style. The plays of William Shakespeare and their modern translations are used as a testbed for evaluating paraphrase systems targeting a specific style of writing. We show that even with a relatively small amount of parallel training data, it is possible to learn paraphrase models which...
Discovering Stylistic Variations in Distributional Vector Space Models via Lexical Paraphrases
Detecting and analyzing stylistic variation in language is relevant to diverse Natural Language Processing applications. In this work, we investigate whether salient dimensions of style variations are embedded in standard distributional vector spaces of word meaning. We hypothesize that distances between embeddings of lexical paraphrases can help isolate style from meaning variations and help i...
A Controlled Language Approach to Text Optimisation in Technical Documentation
In this paper we propose a controlled language approach to text optimisation in the field of technical documentation. Within this approach, we use stylistic paraphrases as an instrument in the optimisation process. We present various categories of paraphrasing principles and describe their implementation in the corrector component of a controlled language checker.
Data-driven Paraphrasing and Stylistic Harmonization
This thesis proposal outlines the use of unsupervised data-driven methods for paraphrasing tasks. We motivate the development of knowledge-free methods through the guiding use case of multi-document summarization, which requires a domain-adaptable system for both the detection and generation of sentential paraphrases. First, we define a number of guiding research questions that will be addressed in ...
Referring Expression Generation Using Speaker-based Attribute Selection and Trainable Realization (ATTR)
In the first REG competition, researchers proposed several general-purpose algorithms for attribute selection for referring expression generation. However, most of this work did not take into account: a) stylistic differences between speakers; or b) trainable surface realization approaches that combine semantic and word order information. In this paper we describe and evaluate several end-to-en...